
* set working dir
cd XXXX
set more off

* --------------------- IMPORT AND SAVE ORDER A --------------------- *

* import qualtrics csv file
* Note: 1) the first row is used for variable names, 2) variable names are converted to lower case for uniformity
import delimited "INCPREF - March 2018 Order A_June 12, 2018_15.21.csv", bindquote(strict) varnames(1) case(lower) clear

* --------------------- USE SECOND ROW AS LABELS --------------------- *

foreach x of varlist * {
  label var `x' `"`=`x'[1]'"'
}

* drop the second csv row (question text), which is now stored in the variable labels
drop in 1
* drop the third csv row with qualtrics import information
drop in 1

* format choices as numeric
* (this has to come after both drops: destring leaves a column unchanged
* while it still holds non-numeric header text)
destring *, replace
desc

* save as stata file
save "INCPREF - March 2018 Order A_June 12, 2018_15.21.dta", replace

* --------------------- IMPORT AND SAVE ORDER B --------------------- *

* import qualtrics csv file
* Note: 1) the first row is used for variable names, 2) variable names are converted to lower case for uniformity
import delimited "INCPREF - March 2018 Order B_June 12, 2018_15.22 2.csv", bindquote(strict) varnames(1) case(lower) clear 

* --------------------- USE SECOND ROW AS LABELS --------------------- *

foreach x of varlist * {
  label var `x' `"`=`x'[1]'"'
}

* drop the second csv row (question text), which is now stored in the variable labels
drop in 1
* drop the third csv row with qualtrics import information
drop in 1

* format choices as numeric
* (this has to come after both drops: destring leaves a column unchanged
* while it still holds non-numeric header text)
destring *, replace
desc

* save as stata file
save "INCPREF - March 2018 Order B_June 12, 2018_15.22 2.dta", replace

* --------------------- APPEND THE TWO DATASETS --------------------- *

use "INCPREF - March 2018 Order A_June 12, 2018_15.21.dta", clear
append using "INCPREF - March 2018 Order B_June 12, 2018_15.22 2.dta", generate(orderb)
la var orderb "Order B"
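* Optional sanity check (a sketch, not part of the original workflow):
* the generate() option of append codes observations from the master data
* (Order A) as 0 and observations from the using data (Order B) as 1,
* so orderb should take only these two values.
assert inlist(orderb, 0, 1)
tab orderb, missing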

* --------------------- RENAMING AND FORMATTING --------------------- *


* add unique id

gen unique_id=_n
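* Optional check (a sketch, not part of the original workflow):
* isid exits with an error unless unique_id uniquely identifies the observations
isid unique_id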



* fix the irregularity in variable names:
* for some reason, unlike q1_q1r,...,q6_q6r, the variable for q7 with ranges is named q7r_q5r

rename q7r_q5r_* q7r_q7r_*



* standardize the varnames similarly to the pilot data: e.g. "q1_q1_12" becomes "choiceq112" (a bit less readable, but then the script can be used with minimal adjustments)
* note that the control variables with names q57,...,q76 are left as is

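* standard questions q1,...,q7
forvalues i = 1/7 {
	rename q`i'_q`i'* choiceq`i'*
}
* questions with ranges q5r,...,q7r: the trailing "_1" is dropped as well
forvalues i = 5/7 {
	rename q`i'r_q`i'r*_1 choiceq`i'r*
}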

* get rid of the remaining "_" in the choice variable names, e.g. "choiceq1_12" becomes "choiceq112"
rename choice*_* choice**


* Each of the (standard) questions q1, q2, q3, q4 has 17 rows, but the last row is numbered 21.
* I think it is easier to rename it to 17 and have a local nrows in generate_variables.do for each question

rename choiceq1*21 choiceq1*17
rename choiceq2*21 choiceq2*17
rename choiceq3*21 choiceq3*17
rename choiceq4*21 choiceq4*17


save "cleaned.dta", replace
